feat(orm): add fuzzy search and relevance ordering (PostgreSQL)#2573
docloulou wants to merge 4 commits into zenstackhq:dev
… only)
- Introduced fuzzy search operators (`fuzzy`, `fuzzyContains`) in the ORM.
- Added `RelevanceOrderBy` type for sorting based on fuzzy search relevance.
- Implemented fuzzy search filters in PostgreSQL dialect.
- Added error handling for unsupported fuzzy search features in MySQL and SQLite dialects.
- Updated Zod schema factory to include fuzzy search fields.
- Created a new `Flavor` model in the schema for testing purposes.
📝 Walkthrough

Adds fuzzy text-search operators and relevance-based ordering: types and Zod schemas updated, base dialect builder extended with abstract hooks, Postgres implements fuzzy/relevance, MySQL/SQLite explicitly reject the features, plus e2e Postgres tests and a test-only Flavor model.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/orm/src/client/crud/dialects/postgresql.ts (1)
561-590: Well-implemented PostgreSQL fuzzy search using pg_trgm.

The implementation correctly uses:
- Trigram similarity operator (`%`) for `fuzzy`
- Word similarity operator (`<%`) for `fuzzyContains` with proper operand ordering
- `GREATEST()` aggregation for multi-field relevance scoring

The use of `sql` template tags is appropriate here as these are PostgreSQL-specific operators not available in Kysely's type-safe API. The `sql` template is Kysely's escape hatch mechanism.

Note: Extension dependencies (`pg_trgm` and `unaccent`) are already documented in the type definitions (`crud-types.ts`). Consider adding runtime error handling if extensions are missing, similar to the `createNotSupportedError` pattern used for MySQL/SQLite, to provide users with a clearer message instead of a generic PostgreSQL error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/orm/src/client/crud/dialects/postgresql.ts` around lines 561 - 590, Add runtime checks for the required PostgreSQL extensions and throw a clear user-facing error if missing: implement an internal check (e.g., ensurePostgresExtensionsAvailable) that queries pg_extension for 'pg_trgm' and 'unaccent' and call it from the PostgreSQL dialect initialization or lazily before using fuzzy features; update buildFuzzyFilter, buildFuzzyContainsFilter, and buildRelevanceOrderBy to call this check (or ensure it's called beforehand) and throw a createNotSupportedError-style error with a clear message and remediation steps if either extension is absent.
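A minimal sketch of the extension check suggested above, assuming the caller has already fetched the installed extension names (for example with `SELECT extname FROM pg_extension`). The helper names `findMissingExtensions` and `assertFuzzyExtensions` are hypothetical and are not part of this PR; a real implementation would presumably throw via `createNotSupportedError` rather than a plain `Error`.

```typescript
// Extensions the fuzzy-search feature depends on, per the PR description.
const REQUIRED_EXTENSIONS = ['pg_trgm', 'unaccent'] as const;

// Hypothetical helper: returns the required extensions that are not installed.
function findMissingExtensions(installed: readonly string[]): string[] {
    const present = new Set<string>(installed);
    return REQUIRED_EXTENSIONS.filter((ext) => !present.has(ext));
}

// Hypothetical guard: throws a descriptive error instead of letting
// PostgreSQL fail later with a generic "operator does not exist" message.
function assertFuzzyExtensions(installed: readonly string[]): void {
    const missing = findMissingExtensions(installed);
    if (missing.length > 0) {
        throw new Error(
            `Fuzzy search requires the PostgreSQL extension(s): ${missing.join(', ')}. ` +
                'Enable them with CREATE EXTENSION before using fuzzy filters or _relevance ordering.',
        );
    }
}
```

Running the catalog query once per connection and caching the result would keep the check off the hot path; whether to run it eagerly or lazily is a design choice left open by the review comment.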
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/orm/src/client/crud-types.ts`:
- Around line 912-930: Update the RelevanceOrderBy type and its JSDoc to match
runtime behavior: change the _relevance.fields type from plain array to a
NonEmptyArray<NonRelationFields<Schema, Model>> so an empty fields list is
rejected at the type level, and revise the comment for _relevance to indicate
that relevance uses PostgreSQL similarity() (and that MySQL is not supported /
throws NotSupported at runtime) so IntelliSense reflects actual provider
constraints; locate the RelevanceOrderBy type and the _relevance field
declaration to make these edits.
In `@packages/orm/src/client/crud/dialects/base-dialect.ts`:
- Around line 1110-1131: The _relevance branch adds complex ordering but cursor
pagination still assumes simple {field: 'asc'|'desc'} entries; update handling
so cursor with a _relevance order is either rejected early or supported: modify
the code path that constructs cursor filters (function buildCursorFilter) to
detect order entries where field === '_relevance' (created via
buildRelevanceOrderBy / buildFieldRef / negateSort) and generate a comparison
that first compares computed relevance (value.search against the same fields)
then applies a deterministic tie-breaker (e.g., primary key) in the same sort
direction, or alternatively throw a clear validation error when a cursor is
supplied alongside an _relevance order; ensure tests cover both rejection and
correct SQL generation if you implement support.
In `@packages/orm/src/client/zod/factory.ts`:
- Around line 1180-1192: The _relevance.fields enum is currently built from all
scalar fields (scalarFieldNames) which allows non-string types; change the
scalarFieldNames computation in the getModelFields/filter pipeline to include
only string-typed scalar fields (e.g., filter by the field metadata indicating
type === 'String' or equivalent in your field definition) so that
_relevance.fields contains only string fields, and keep the z.enum(...) usage
but fed from the new string-only scalarFieldNames; update the code around
getModelFields, scalarFieldNames, and the _relevance strictObject construction
to reflect this restriction.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5c89d679-7173-415f-83ce-5738308b98ee
📒 Files selected for processing (12)
- packages/orm/src/client/constants.ts
- packages/orm/src/client/crud-types.ts
- packages/orm/src/client/crud/dialects/base-dialect.ts
- packages/orm/src/client/crud/dialects/mysql.ts
- packages/orm/src/client/crud/dialects/postgresql.ts
- packages/orm/src/client/crud/dialects/sqlite.ts
- packages/orm/src/client/zod/factory.ts
- tests/e2e/orm/client-api/fuzzy-search.test.ts
- tests/e2e/orm/schemas/basic/input.ts
- tests/e2e/orm/schemas/basic/models.ts
- tests/e2e/orm/schemas/basic/schema.ts
- tests/e2e/orm/schemas/basic/schema.zmodel
- `_relevance.fields` restricted to String fields in the Zod schema
- Cursor pagination combined with `_relevance` ordering is rejected
- `RelevanceOrderBy` type restricted to `StringFields` with a non-empty tuple
- JSDoc updated to reflect PostgreSQL-only support
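A minimal sketch of the cursor rejection described in this commit, written against stand-in types rather than the real ORM argument types. The names `FindArgsSketch` and `assertNoCursorWithRelevance` are hypothetical, and the actual check in `base-dialect.ts` may be placed and worded differently.

```typescript
// Stand-in shapes for illustration only; the real types live in crud-types.ts.
type OrderByEntry = Record<string, unknown>;

interface FindArgsSketch {
    cursor?: Record<string, unknown>;
    orderBy?: OrderByEntry | OrderByEntry[];
}

// Throws when a cursor is supplied together with a `_relevance` order entry.
// Cursor filters assume simple `{ field: 'asc' | 'desc' }` entries, so a
// computed relevance score cannot be compared against the cursor row.
function assertNoCursorWithRelevance(args: FindArgsSketch): void {
    if (args.cursor === undefined || args.orderBy === undefined) {
        return;
    }
    const entries = Array.isArray(args.orderBy) ? args.orderBy : [args.orderBy];
    if (entries.some((entry) => '_relevance' in entry)) {
        throw new Error('Cursor pagination cannot be combined with "_relevance" ordering');
    }
}
```

Rejecting early keeps the failure mode explicit; supporting the combination would need a relevance comparison plus a deterministic tie-breaker, as the review comment notes.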
Hi @docloulou, thanks for this amazing PR, very useful feature and well implemented! I'm wondering if you're fine with delaying it to release v3.7 or 3.8. Asking this because, although not directly related, it's a bit odd to support fuzzy search but not regular full text search (a feature gap from Prisma). I hope to get FTS implemented, probably in 3.7, and we can have this feature either together or in a subsequent minor release. What do you think?

No problem for me. If the code in this PR looks solid to you, it can serve as a good template for adding the FTS feature. The main things left to handle would be adding the

Note: one thing to watch out for: in this PR I'm using `_relevance` (as Prisma does for FTS) for the fuzzy search, so there could be a conflict.

I'm really loving the idea of this! I would definitely use this a lot :D

Yes, it'll definitely help the FTS implementation. Maybe we can share the `_relevance` field if conflicts can be very carefully avoided.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/orm/src/client/crud-types.ts`:
- Around line 906-922: The _relevance.fields type currently allows enum-backed
string literal unions because StringFields is based on MapModelFieldType
returning any string-like type; change the typing so only true free-form string
columns are allowed (i.e. require MapModelFieldType<Schema, Model, Key> to be
exactly the primitive string type, not a string-literal union or enum type) and
use that stricter alias in RelevanceOrderBy._relevance.fields; align this
compile-time guard with the runtime check implemented in the zod factory (the
string-only schema logic around factory.ts handling of String) by
creating/renaming StringFields to something like StringFieldsOnlyString and
swapping it into RelevanceOrderBy._relevance.fields.
📒 Files selected for processing (6)
- packages/orm/src/client/crud-types.ts
- packages/orm/src/client/crud/dialects/base-dialect.ts
- packages/orm/src/client/crud/dialects/mysql.ts
- packages/orm/src/client/crud/dialects/postgresql.ts
- packages/orm/src/client/crud/dialects/sqlite.ts
- packages/orm/src/client/zod/factory.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/orm/src/client/crud/dialects/postgresql.ts
- packages/orm/src/client/zod/factory.ts
type StringFields<Schema extends SchemaDef, Model extends GetModels<Schema>> = {
    [Key in NonRelationFields<Schema, Model>]: MapModelFieldType<Schema, Model, Key> extends string | null
        ? Key
        : never;
}[NonRelationFields<Schema, Model>];

export type RelevanceOrderBy<Schema extends SchemaDef, Model extends GetModels<Schema>> = {
    /**
     * Sorts by fuzzy search relevance using PostgreSQL `similarity()` from `pg_trgm`.
     * Not supported on MySQL or SQLite (throws `NotSupported` at runtime).
     * Cannot be combined with cursor-based pagination.
     */
    _relevance?: {
        /**
         * String fields to compute relevance against (must be non-empty).
         */
        fields: [StringFields<Schema, Model>, ...StringFields<Schema, Model>[]];
_relevance.fields is currently typed too broadly.
StringFields is derived from MapModelFieldType, so enum scalars also match here because their TS shape is a string-literal union. That means _relevance.fields accepts enum columns at compile time, while the runtime schema in packages/orm/src/client/zod/factory.ts:1304-1316 only allows fields whose schema type is String.
Suggested fix
type StringFields<Schema extends SchemaDef, Model extends GetModels<Schema>> = {
- [Key in NonRelationFields<Schema, Model>]: MapModelFieldType<Schema, Model, Key> extends string | null
+ [Key in NonRelationFields<Schema, Model>]: FieldIsArray<Schema, Model, Key> extends true
+ ? never
+ : GetModelFieldType<Schema, Model, Key> extends 'String'
? Key
: never;
}[NonRelationFields<Schema, Model>];
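The mismatch flagged above can be reproduced in isolation. The sketch below uses stand-in names (`Role`, `FieldDefSketch`, `stringFieldNames`) that are hypothetical and not taken from the codebase: the first part shows why a check on the mapped TypeScript type lets enum columns through, and the second part shows a runtime filter on the declared schema type, which is the behaviour the review attributes to the Zod factory.

```typescript
// An enum column maps to a string-literal union at the TypeScript level.
type Role = 'ADMIN' | 'USER';

// Same conditional shape as the StringFields check in the diff above.
type LooksLikeString<T> = T extends string | null ? true : false;

// Compiles: the union extends `string`, so an enum field would be accepted.
const enumPassesTypeCheck: LooksLikeString<Role> = true;

// Stand-in for field metadata; the real schema definition differs.
interface FieldDefSketch {
    name: string;
    type: string; // declared schema type, e.g. 'String', 'Int', 'Role'
    array?: boolean;
    relation?: boolean;
}

// Keeps only scalar, non-array fields whose declared type is exactly 'String'.
function stringFieldNames(fields: readonly FieldDefSketch[]): string[] {
    return fields
        .filter((f) => !f.relation && !f.array && f.type === 'String')
        .map((f) => f.name);
}
```

Checking the declared type name instead of the mapped TypeScript type is what makes the compile-time and runtime guards agree.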
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/server/test/openapi/baseline/rpc.baseline.yaml`:
- Around line 4326-4329: The OpenAPI generator is advertising PostgreSQL-only
search operators (fuzzy, fuzzyContains, and _relevance) unconditionally; update
the generator in packages/orm/src/client/zod/factory.ts so that the schema
properties for "fuzzy", "fuzzyContains" and "_relevance" are only emitted when
the datasource provider is Postgres (or when the required extension is enabled),
otherwise either omit them or add an explicit description/enum stating
"PostgreSQL-only / extension required"; locate the emission logic around the
existing provider-agnostic blocks (see the sections you flagged near lines
~1025-1035 and ~1304-1316) and add a provider check (or description branch)
before adding those properties to the OpenAPI schema.
📒 Files selected for processing (1)
packages/server/test/openapi/baseline/rpc.baseline.yaml
fuzzy:
  type: string
fuzzyContains:
  type: string
Don't advertise PostgreSQL-only search operators unconditionally.
These additions make fuzzy/fuzzyContains and _relevance part of the generated OpenAPI contract, but the implementation is PostgreSQL-only per the PR and throws NotSupported on MySQL/SQLite. The linked generator logic in packages/orm/src/client/zod/factory.ts:1025-1035 and packages/orm/src/client/zod/factory.ts:1304-1316 also appears provider-agnostic, so non-Postgres projects will expose operations that cannot succeed at runtime. Please gate these schema properties on the datasource provider, or at least emit explicit PostgreSQL-only / extension-required descriptions so generated clients are not misled.
Also applies to: 5141-5163
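A minimal sketch of the provider gating requested above, assuming the schema builder knows the datasource provider. The `Provider` type, the base operator list, and `stringFilterOperators` are hypothetical stand-ins; the real emission logic in `factory.ts` builds Zod schemas rather than string lists.

```typescript
// Hypothetical provider identifiers for illustration.
type Provider = 'postgresql' | 'mysql' | 'sqlite';

// Illustrative subset of provider-agnostic string filter operators.
const BASE_STRING_OPERATORS = ['equals', 'contains', 'startsWith', 'endsWith'];

// Operators that only the PostgreSQL dialect implements in this PR.
const POSTGRES_ONLY_OPERATORS = ['fuzzy', 'fuzzyContains'];

// Returns the operator names to expose in the generated schema, so that
// non-Postgres projects do not advertise operations that throw at runtime.
function stringFilterOperators(provider: Provider): string[] {
    return provider === 'postgresql'
        ? [...BASE_STRING_OPERATORS, ...POSTGRES_ONLY_OPERATORS]
        : [...BASE_STRING_OPERATORS];
}
```

The same condition would apply to `_relevance` in the `orderBy` schema. The alternative mentioned in the comment, keeping the properties but adding a "PostgreSQL-only" description, trades a stable contract across providers for a weaker guarantee.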
Summary

- `fuzzy` and `fuzzyContains` filter operators for String fields in `where` clauses, using PostgreSQL's `pg_trgm` extension with `unaccent` for accent-insensitive trigram matching
- `_relevance` ordering in `orderBy` to sort results by fuzzy similarity score, supporting single and multiple fields
- `NotSupported` errors for these operators on MySQL and SQLite

New API
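A hedged sketch of what query arguments using these operators might look like. The types below are local stand-ins, not the generated ORM types. The `search` key and the non-empty `fields` tuple come from the review discussion; the `sort` key follows Prisma's `_relevance` convention and is an assumption, as are the `name` and `description` fields on the test-only `Flavor` model.

```typescript
// Local stand-in types for illustration; the real ones are generated by the ORM.
interface StringFilterSketch {
    fuzzy?: string; // whole-value trigram similarity match
    fuzzyContains?: string; // approximate substring (word similarity) match
}

interface RelevanceSketch {
    fields: [string, ...string[]]; // non-empty list of String fields
    search: string;
    sort: 'asc' | 'desc'; // assumed, following Prisma's _relevance shape
}

interface FlavorFindArgsSketch {
    where?: { name?: StringFilterSketch; description?: StringFilterSketch };
    orderBy?: { _relevance: RelevanceSketch };
    take?: number;
}

// Find flavors whose name is similar to "creme" (so "Crème" can match),
// most relevant first.
const args: FlavorFindArgsSketch = {
    where: { name: { fuzzy: 'creme' } },
    orderBy: { _relevance: { fields: ['name', 'description'], search: 'creme', sort: 'desc' } },
    take: 10,
};
```

With the real client these arguments would presumably be passed to a call such as `findMany` on the `Flavor` model; that call is omitted here because the client setup is not shown in this PR excerpt.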
Prerequisites (PostgreSQL)
The user must enable the following extensions in their PostgreSQL database: `pg_trgm` and `unaccent`.
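As a sketch, the statements for enabling these extensions could be generated as below, for use in a migration or setup script. `CREATE EXTENSION IF NOT EXISTS` is standard PostgreSQL syntax; the helper name `enableExtensionStatements` is hypothetical, and running the statements typically requires sufficient database privileges.

```typescript
// Extensions named in the PR description.
const FUZZY_SEARCH_EXTENSIONS = ['pg_trgm', 'unaccent'];

// Hypothetical helper: builds one idempotent CREATE EXTENSION statement per extension.
function enableExtensionStatements(extensions: readonly string[] = FUZZY_SEARCH_EXTENSIONS): string[] {
    return extensions.map((ext) => `CREATE EXTENSION IF NOT EXISTS ${ext};`);
}
```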
Files changed

| File | Change |
| --- | --- |
| packages/orm/src/client/crud-types.ts | `fuzzy`, `fuzzyContains` in `StringFilter`; `RelevanceOrderBy` type |
| packages/orm/src/client/constants.ts | `Fuzzy` filter kind in `FILTER_PROPERTY_TO_KIND` |
| packages/orm/src/client/crud/dialects/base-dialect.ts | `fuzzy`, `fuzzyContains`, `_relevance` in filter/orderBy builders; 3 abstract methods |
| packages/orm/src/client/crud/dialects/postgresql.ts | `pg_trgm` + `unaccent` implementation (`%`, `<%`, `similarity()`, `GREATEST()`) |
| packages/orm/src/client/crud/dialects/mysql.ts | `NotSupported` errors |
| packages/orm/src/client/crud/dialects/sqlite.ts | `NotSupported` errors |
| packages/orm/src/client/zod/factory.ts | `fuzzy`, `fuzzyContains`, `_relevance` |
| tests/e2e/orm/schemas/basic/schema.zmodel | `Flavor` model |
| tests/e2e/orm/client-api/fuzzy-search.test.ts | |

Implementation details
`fuzzy` filter

Uses PostgreSQL trigram similarity operator `%` with `unaccent` and `lower` for accent-insensitive, case-insensitive matching:

`fuzzyContains` filter

Uses PostgreSQL word similarity operator `<%` to check if the search term is approximately contained as a substring:

`_relevance` ordering

Uses `similarity()` function for single fields, `GREATEST()` for multiple fields:

Test plan
- `_relevance` ordering (single field, multiple fields, with pagination)

Documentation: zenstackhq/zenstack-docs#596
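A minimal sketch of the SQL shapes implied by the implementation details in this description, built as plain strings with `$1` standing for the bound search term. This is an illustration only: the PR builds these expressions with Kysely's `sql` template tag, and the helper names here are hypothetical. For `<%` the search term is placed on the left, which is the operand order PostgreSQL's word similarity operator expects.

```typescript
// Wraps an expression for accent-insensitive, case-insensitive comparison.
const normalize = (expr: string): string => `unaccent(lower(${expr}))`;

// Quotes a column name as a PostgreSQL identifier.
const column = (name: string): string => `"${name.replace(/"/g, '""')}"`;

// `fuzzy`: trigram similarity between the column and the search term.
function fuzzyFilterSql(field: string): string {
    return `${normalize(column(field))} % ${normalize('$1')}`;
}

// `fuzzyContains`: the search term is approximately contained in the column.
function fuzzyContainsFilterSql(field: string): string {
    return `${normalize('$1')} <% ${normalize(column(field))}`;
}

// `_relevance`: similarity() for one field, GREATEST() across several.
function relevanceSql(fields: [string, ...string[]]): string {
    const scores = fields.map((f) => `similarity(${normalize(column(f))}, ${normalize('$1')})`);
    return scores.length === 1 ? scores.join('') : `GREATEST(${scores.join(', ')})`;
}
```

Using `GREATEST()` means a row ranks by its best-matching field, so a strong match in one field is not diluted by weak matches in the others.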